Privacy Controls for Data Flow

Thursday, May 14, 2020

Privacy controls for data flow is a technology that applies privacy controls to data as it moves between systems and organizations. It provides a descriptive method for implementing constraints imposed by regulations, privacy policies, and information sharing agreements to streamline data flow. It can also apply user consent information to mask or remove parts of data based on user data sharing preferences. Privacy controls for data flow works with standard or customized JSON and XML schemas. Because of this, it can be used in many data processing and exchange scenarios, including:

Data analytics applications,
Personal data management applications,
Financial applications,
Healthcare data exchange (using FHIR, HL7, Open mHealth schemas),
Health and wellness applications,
Medical research data sharing.

Privacy controls for data flow applies privacy policies to data as it moves. Privacy policies are represented as contextual views of personal data: a policy specifies the parts of personal data that can be used for a given purpose or exchanged with a third party. Privacy controls for data flow can implement such policies to sanitize data as it is loaded into an analytics database or exchanged with third parties. It also integrates user consent, so user records without appropriate consent can be eliminated from data sets, or masked based on granular user choices.

Privacy controls for data flow implements the privacy layer for JSON and XML schemas using overlays. Standardized schemas play a crucial role in interoperability: when data conforms to a schema, it can be easily exchanged and correctly interpreted by multiple parties. A schema is a machine-readable specification that describes the structure and, to some extent, the meaning of data. JSON and XML schemas are widely used in the industry to validate data during processing and exchange. However, schemas have no built-in mechanism for expressing privacy concerns. Privacy controls for data flow adds a privacy layer to standard schemas such as FHIR, as well as to custom schemas shared among partner organizations.

Separating privacy concerns from schemas using overlays has many benefits. ConsentGrid uses metadata added by overlays to filter data fields based on user consent, which makes it possible to remove sensitive information when exchanging data with third parties. Multiple overlays can be combined to compose views of data that are annotated and transformed for different use cases. For example, a mobile application that collects personal data can apply one set of overlays to remove identifiable information when sending data to a public service, but use a pseudonymization overlay when performing analytics. Shared overlays can be used to sanitize data when multiple parties feed data into a common data pool.

Privacy controls for data flow uses entity relationships and data structures from JSON and XML schemas. Domain-specific standard schemas and custom schemas usually define numerous interconnected data objects. Privacy controls for data flow uses these data objects and the entity relationships extracted from the schema as the basis for overlays.
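As an illustration of what extracting data objects from a schema can involve, the following Python sketch walks a small, hypothetical JSON Schema for the Person entity, follows a local `$ref` to a separate Address definition, and lists the dotted paths of the leaf fields that overlays could point to. The schema, the `field_paths` helper, and the dotted path notation are assumptions for this example, not the product's internals; the sketch ignores arrays, JSON Pointer escaping, and recursive schemas.

```python
# Hypothetical JSON Schema for the Person entity used in this post.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "birthDate": {"type": "string", "format": "date"},
        "address": {"$ref": "#/definitions/Address"},
    },
    "definitions": {
        "Address": {
            "type": "object",
            "properties": {
                "streetAddress": {"type": "string"},
                "city": {"type": "string"},
                "state": {"type": "string"},
            },
        }
    },
}


def resolve_ref(root, ref):
    """Follow a local reference such as '#/definitions/Address' (no escaping)."""
    node = root
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node


def field_paths(root, node=None, prefix=""):
    """List dotted paths of all leaf fields reachable from a schema node.

    Recursive schemas (an object referring back to itself) would loop forever;
    a real implementation would need cycle detection.
    """
    node = root if node is None else node
    if "$ref" in node:
        node = resolve_ref(root, node["$ref"])
    paths = []
    for name, sub in node.get("properties", {}).items():
        path = prefix + name
        sub = resolve_ref(root, sub["$ref"]) if "$ref" in sub else sub
        if sub.get("type") == "object":
            # Nested entity: descend and prefix its fields with "name."
            paths.extend(field_paths(root, sub, path + "."))
        else:
            paths.append(path)
    return paths


print(field_paths(PERSON_SCHEMA))
```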

Overlays define additional processing steps for an entity. An overlay contains pointers to fields in a JSON or XML document along with processing directives. The structure of these processing directives depends on the type of the overlay.
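The post does not define the pointer syntax in detail. Assuming the dotted notation with `*` wildcards that appears in the overlay examples in this post, a pointer could be resolved against a concrete document roughly as follows. `resolve` is a hypothetical helper, and arrays are not handled.

```python
def resolve(doc, pointer):
    """Expand a dotted pointer (with optional '*' segments) into the concrete
    field paths that actually exist in `doc`. Hypothetical pointer syntax."""
    results = []

    def walk(node, segments, prefix):
        if not segments:
            results.append(prefix)  # every segment matched: record the path
            return
        if not isinstance(node, dict):
            return  # pointer goes deeper than the document does
        head, rest = segments[0], segments[1:]
        # '*' matches every key at this level; otherwise match one key exactly.
        keys = list(node) if head == "*" else ([head] if head in node else [])
        for key in keys:
            walk(node[key], rest, f"{prefix}.{key}" if prefix else key)

    walk(doc, pointer.split("."), "")
    return results


person = {
    "firstName": "Test",
    "lastName": "User",
    "address": {"streetAddress": "12 Main St", "city": "Anycity", "state": "CA"},
}
print(resolve(person, "address.*"))
print(resolve(person, "middleName"))  # missing fields resolve to nothing
```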

Labeling overlays add metadata that classifies data fields into privacy categories. This metadata is not stored in the document itself; it is kept as a separate layer of information that other overlays can use. The following labeling overlay for a hypothetical Person entity adds the PII label to the firstName and lastName fields, and the PII and LOC labels to all elements of the address field:

{
    "overlayType": "label",
    "spec": {
        "firstName": [ "PII" ],
        "lastName": [ "PII" ],
        "address.*": [ "PII", "LOC" ]
    }
}

These labels can be linked with granular consent scopes, and additional overlays can remove these fields if user consent is lacking for those labels.
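A minimal sketch of applying the labeling overlay above, assuming field patterns are matched against dotted leaf paths with shell-style wildcards (Python's `fnmatch.fnmatchcase`, where `*` also matches across dots). The result is a label map returned alongside the document rather than written into it, mirroring the separate layer described above. `leaf_paths` and `apply_label_overlay` are hypothetical names.

```python
from fnmatch import fnmatchcase

LABEL_OVERLAY = {
    "overlayType": "label",
    "spec": {
        "firstName": ["PII"],
        "lastName": ["PII"],
        "address.*": ["PII", "LOC"],
    },
}


def leaf_paths(node, prefix=""):
    """Yield dotted paths to every non-object value in a JSON document."""
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from leaf_paths(value, path)
        else:
            yield path


def apply_label_overlay(doc, overlay):
    """Return {path: sorted labels}; `doc` itself is left untouched."""
    labels = {}
    for path in leaf_paths(doc):
        for pattern, tags in overlay["spec"].items():
            if fnmatchcase(path, pattern):
                labels.setdefault(path, set()).update(tags)
    return {path: sorted(tags) for path, tags in labels.items()}


person = {
    "firstName": "Test",
    "lastName": "User",
    "birthDate": "2001-09-01",
    "address": {"streetAddress": "12 Main St", "city": "Anycity", "state": "CA"},
}
print(apply_label_overlay(person, LABEL_OVERLAY))
```

Keeping labels in a separate map means several labeling overlays can contribute to the same layer without ever altering the data being labeled.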

Hashing overlays can be used in the pseudonymization of data. Below is an example hashing overlay for the same Person entity. When processed, it will calculate a SHA256 hash using the firstName and lastName fields, insert a hash field into the document, and remove the firstName and lastName fields.

{
    "overlayType": "hash",
    "spec": {
        "hash": {
            "algorithm": "SHA256",
            "sources": {
                "firstName": { "remove": true },
                "lastName": { "remove": true }
            }
        }
    }
}
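The following Python sketch shows one way the hashing overlay could be evaluated, assuming `sources` is an object mapping field names to options. How the source values are combined before hashing is not specified in the post; here each value is UTF-8 encoded and terminated with a NUL byte (an assumption) so that different splits of the same characters hash differently. `apply_hash_overlay` is a hypothetical helper.

```python
import copy
import hashlib

# The hashing overlay, with "sources" written as a JSON object.
HASH_OVERLAY = {
    "overlayType": "hash",
    "spec": {
        "hash": {
            "algorithm": "SHA256",
            "sources": {
                "firstName": {"remove": True},
                "lastName": {"remove": True},
            },
        }
    },
}


def apply_hash_overlay(doc, overlay):
    """Return a copy of `doc` with a hash field added for each spec entry."""
    out = copy.deepcopy(doc)
    for target, rule in overlay["spec"].items():
        if rule["algorithm"] != "SHA256":
            raise ValueError(f"unsupported algorithm: {rule['algorithm']}")
        digest = hashlib.sha256()
        for field in rule["sources"]:
            # Assumed encoding: UTF-8 value followed by a NUL separator.
            digest.update(str(out.get(field, "")).encode("utf-8") + b"\x00")
        out[target] = digest.hexdigest()
        # Drop the source fields whose options ask for removal.
        for field, opts in rule["sources"].items():
            if opts.get("remove"):
                out.pop(field, None)
    return out


person = {"firstName": "Test", "lastName": "User", "birthDate": "2001-09-01"}
print(apply_hash_overlay(person, HASH_OVERLAY))
```

Because names come from a small, guessable space, an unkeyed hash like this can be reversed by hashing candidate values. A keyed construction such as `hmac.new(key, msg, hashlib.sha256)` from Python's standard library is a common alternative when the hash must resist that kind of lookup.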

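Masking overlays hide data by removing fields from the document or by modifying field values. Specialized masking overlays filter data fields based on the privacy labels added by labeling overlays and on the user's active consent. The example below replaces the value of lastName with "redacted" and masks firstName, leaving at most two of its characters visible.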

{
    "overlayType": "mask",
    "spec": {
        "lastName": { "replace": { "with": "redacted" } },
        "firstName": { "mask": { "leaveAtMostN": 2, "fill": "***" } }
    }
}
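A sketch of the two masking directives in the example, under assumed semantics: `replace` substitutes a fixed value, and `mask` keeps a prefix of at most `leaveAtMostN` characters followed by the `fill` string. Treating "at most" as also meaning "never the whole value" (so short values still lose a character) is an interpretation, not documented behavior. `mask_value` and `apply_mask_overlay` are hypothetical helpers.

```python
import copy

MASK_OVERLAY = {
    "overlayType": "mask",
    "spec": {
        "lastName": {"replace": {"with": "redacted"}},
        "firstName": {"mask": {"leaveAtMostN": 2, "fill": "***"}},
    },
}


def mask_value(value, leave_at_most_n, fill):
    """Keep a short prefix of `value` and append `fill`.

    Assumption: at least one character is always hidden, so a value no longer
    than N is never revealed in full.
    """
    text = str(value)
    keep = min(leave_at_most_n, max(len(text) - 1, 0))
    return text[:keep] + fill


def apply_mask_overlay(doc, overlay):
    """Return a masked copy of `doc`; fields absent from `doc` are skipped."""
    out = copy.deepcopy(doc)
    for field, rule in overlay["spec"].items():
        if field not in out:
            continue
        if "replace" in rule:
            out[field] = rule["replace"]["with"]
        elif "mask" in rule:
            m = rule["mask"]
            out[field] = mask_value(out[field], m["leaveAtMostN"], m["fill"])
    return out


print(apply_mask_overlay({"firstName": "Test", "lastName": "User"}, MASK_OVERLAY))
# {'firstName': 'Te***', 'lastName': 'redacted'}
```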

Even though these overlays are defined as JSON documents, they can be used to process both JSON and XML documents.
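To illustrate how a pointer written for JSON might be applied to XML, the sketch below maps a dotted pointer onto an ElementTree path (`address.*` becomes `address/*`) and removes the matching elements using Python's standard `xml.etree.ElementTree`. The one-to-one mapping between JSON field names and XML element names is an assumption; real XML schemas also carry attributes and namespaces, which this ignores. `remove_xml_field` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def remove_xml_field(root, pointer):
    """Remove the element(s) a dotted pointer such as 'address.*' refers to.

    Assumption: each dot-separated segment names a child element, so the
    pointer translates directly into an ElementTree path.
    """
    segments = pointer.split(".")
    parent_path = "/".join(segments[:-1])
    parents = [root] if not parent_path else root.findall(parent_path)
    for parent in parents:
        # findall returns a list, so removing while iterating over it is safe.
        for child in parent.findall(segments[-1]):
            parent.remove(child)


root = ET.fromstring(
    "<Person><firstName>Test</firstName><lastName>User</lastName>"
    "<address><streetAddress>12 Main St</streetAddress>"
    "<city>Anycity</city><state>CA</state></address></Person>"
)
remove_xml_field(root, "address.*")
print(ET.tostring(root, encoding="unicode"))
```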

Multiple overlays can be stacked to compose different views of an entity. For example, one view can apply several labeling overlays, each using different labeling criteria, followed by consent-based filtering that removes fields the user has not consented to share. Another view can pseudonymize data during transfer to a research data store.
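One way to model stacked overlays, sketched in Python under the assumption that each overlay is a function from a (document, labels) pair to a new pair, so labels stay outside the document. The step names (`label`, `require_consent`, `pseudonymize`, `compose_view`), the top-level-only field handling, and the `DOB` label are hypothetical and chosen only to show two views built from different stacks.

```python
import hashlib


def label(spec):
    """Hypothetical labeling step for top-level fields."""
    def step(state):
        doc, labels = state
        merged = {field: set(tags) for field, tags in labels.items()}
        for field, tags in spec.items():
            if field in doc:
                merged.setdefault(field, set()).update(tags)
        return doc, merged
    return step


def require_consent(checked, consented):
    """Drop fields carrying a checked label the user has not consented to."""
    def step(state):
        doc, labels = state
        blocked = {f for f, tags in labels.items() if (tags & checked) - consented}
        return {k: v for k, v in doc.items() if k not in blocked}, labels
    return step


def pseudonymize(*fields):
    """Replace `fields` with a single unkeyed SHA-256 hash field."""
    def step(state):
        doc, labels = state
        joined = "\x00".join(str(doc.get(f, "")) for f in fields)
        out = {k: v for k, v in doc.items() if k not in fields}
        out["hash"] = hashlib.sha256(joined.encode("utf-8")).hexdigest()
        return out, labels
    return step


def compose_view(*steps):
    """A view is an ordered stack of overlay steps applied to a document."""
    def view(doc):
        state = (doc, {})
        for step in steps:
            state = step(state)
        return state[0]
    return view


consent_view = compose_view(
    label({"firstName": ["PII"], "lastName": ["PII"]}),
    label({"birthDate": ["DOB"]}),  # second labeling overlay, other criteria
    require_consent(checked={"PII", "DOB"}, consented={"DOB"}),
)
research_view = compose_view(pseudonymize("firstName", "lastName"))

person = {"firstName": "Test", "lastName": "User", "birthDate": "2001-09-01"}
print(consent_view(person))  # {'birthDate': '2001-09-01'}
print(research_view(person))
```

Because every step returns new values instead of mutating its input, the same source document can be fed to any number of views independently.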

If we process a hypothetical Person document with the labeling overlay, it attaches the following labels (shown here in brackets; they are kept in a separate layer rather than stored in the document):

{
  "firstName": "Test", [PII]
  "lastName": "User",  [PII]
  "birthDate": "2001-09-01",
  "address": {                    
    "streetAddress": "12 Main St", [PII,LOC]
    "city": "Anycity",             [PII,LOC]
    "state": "CA"                  [PII,LOC]
  }
}

If the view includes an overlay that checks consent for data labeled LOC, and the data subject has not given that consent, the result is:

{
  "firstName": "Test",
  "lastName": "User", 
  "birthDate": "2001-09-01",
  "address": {}
}
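The consent check in this example can be sketched as follows, assuming the label layer is a map from dotted paths to label sets (the same labels shown in the annotated document above) and that a field is removed when it carries a checked label the data subject has not consented to. With `checked={"LOC"}` and no consent, the sketch reproduces the output above. `consent_filter` and `remove_path` are hypothetical helpers.

```python
import copy

# Label layer produced by the labeling overlay (kept outside the document).
LABELS = {
    "firstName": {"PII"},
    "lastName": {"PII"},
    "address.streetAddress": {"PII", "LOC"},
    "address.city": {"PII", "LOC"},
    "address.state": {"PII", "LOC"},
}


def remove_path(doc, path):
    """Delete a dotted path from a nested dict if it exists."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return
    node.pop(leaf, None)


def consent_filter(doc, labels, checked, consented):
    """Remove every field carrying a checked label that lacks consent."""
    out = copy.deepcopy(doc)
    for path, tags in labels.items():
        if (tags & checked) - consented:
            remove_path(out, path)
    return out


person = {
    "firstName": "Test",
    "lastName": "User",
    "birthDate": "2001-09-01",
    "address": {"streetAddress": "12 Main St", "city": "Anycity", "state": "CA"},
}
print(consent_filter(person, LABELS, checked={"LOC"}, consented=set()))
```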

The same document processed by the hashing overlay looks like this (hash value truncated):

{
  "hash": "e3b0c...",
  "birthDate": "2001-09-01",
  "address": {
    "streetAddress": "12 Main St",
    "city": "Anycity",
    "state": "CA"
  }
}

You can see a demonstration of this technology on the Privacy Controls for Data Flow Demo page.